Learning Context for Text Categorization

Authors

  • Yashodhara V. Haribhakta
  • Parag Kulkarni
Abstract

This paper describes our work on discovering context for text document categorization. The categorization approach combines a learning paradigm known as relation extraction with a technique known as context discovery. We demonstrate the effectiveness of the approach on the Reuters-21578 dataset and on synthetic real-world data from the sports domain. Our experimental results indicate that the learned context substantially improves categorization performance compared with traditional categorization approaches.

Similar Resources

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in the amount of information, tools and methods for searching, filtering and managing resources are highly needed. One of the major problems in text classification is the high dimensionality of the feature space. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...

Feature Selection for Effective Text Classification using Semantic Information

Text categorization is the task of assigning texts or documents to pre-specified classes or categories. For improved document classification, text-based learning needs to understand context, just as humans judge the relevance of a text through the context associated with it; it is therefore necessary to incorporate context information with the text in machine learning for better clas...

Context-sensitive Learning Methods for Text Categorization

Two recently implemented machine learning algorithms, RIPPER and sleeping experts, are evaluated on a number of large text categorization problems. These algorithms both construct classifiers that allow the "context" of a word w to affect how (or even whether) the presence or absence of w will contribute to a classification. However, RIPPER and sleeping experts differ radically in many other resp...

A framework for text categorization

The field of automatic Text Categorization (TC) concerns the creation of categorizer functions, usually involving Machine Learning techniques, to assign labels from a pre-defined set of categories to documents based on the documents’ content. Because of the many variations on how this can be achieved and the diversity of applications in which it can be employed, creating specific TC application...

Sampling Strategies and Learning Efficiency in Text Categorization

This paper studies training set sampling strategies in the context of statistical learning for text categorization. It argues that sampling strategies favoring common categories are superior to uniform coverage or mistake-driven approaches if performance is measured by globally assessed precision and recall. The hypothesis is empirically validated by examining the performance of a nearest neighbo...

Journal title:
  • CoRR

Volume: abs/1112.2031

Publication date: 2011